# High-Resolution Processing

C-RADIOv2 is a visual feature extraction model developed by NVIDIA, offering multiple size versions suitable for image understanding and dense visual tasks.

Vit So400m Patch14 Siglip Gap 896.pali Pt

A vision model based on the SigLIP image encoder that uses global average pooling; part of the PaliGemma project.
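To make the pooling step concrete, here is a minimal sketch in plain Python. It assumes the usual reading of the model name, that is, a patch size of 14 pixels and an 896-pixel square input; the listing itself does not confirm this. The helper names `num_patches` and `global_average_pool` are hypothetical and are not part of any library.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def global_average_pool(tokens: list[list[float]]) -> list[float]:
    """Average equal-length patch token vectors into a single embedding."""
    count = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / count for i in range(dim)]


# Assumed reading of the name: 896-pixel input, 14-pixel patches.
print(num_patches(896, 14))

# Toy example: two 2-dimensional patch tokens pooled into one vector.
print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Under these assumptions the encoder would see 64 patches per side, and global average pooling would reduce all patch tokens to one fixed-size vector regardless of how many patches the input produces.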

Mini InternVL2 1B DA DriveLM

Mini-InternVL2-DA-RS is a multimodal model optimized for the remote sensing image domain, based on the Mini-InternVL architecture. It has been fine-tuned through a domain adaptation framework and demonstrates excellent performance in remote sensing image understanding tasks.

Transformers Other

Timesformer Hr Finetuned K600

TimeSformer-HR is a video action recognition model optimized for high-resolution videos and fine-tuned on the Kinetics-600 dataset.

Video Processing

A visual feature extraction model developed by NVIDIA for generating image embeddings, supporting downstream tasks such as image classification.

Aesthetic Shadow

Aesthetic Shadow is a Vision Transformer model with 1.1 billion parameters, specifically designed for evaluating anime image quality.

Image Classification

Segformer B4 City Satellite Segmentation 1024x1024

A satellite image segmentation model based on the SegFormer architecture, designed for segmenting urban areas.

Image Segmentation

Efficientnet B6

EfficientNet is a mobile-friendly, purely convolutional model that scales depth, width, and input resolution together using a single compound coefficient. It is trained on the ImageNet-1k dataset.

Image Classification
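The compound scaling mentioned in the EfficientNet description can be sketched as follows. The base coefficients are the ones reported in the EfficientNet paper for the B0 baseline; the function names are hypothetical, and the `phi` values are illustrative only, since this listing does not state the exact multipliers used for the B6 checkpoint.

```python
# Base coefficients from the EfficientNet paper (grid-searched on the B0 baseline).
ALPHA = 1.2   # depth base
BETA = 1.1    # width base
GAMMA = 1.15  # resolution base


def compound_scale(phi: float) -> dict[str, float]:
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_factor(phi: float) -> float:
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    s = compound_scale(phi)
    return s["depth"] * s["width"] ** 2 * s["resolution"] ** 2


for phi in (0, 1, 2):  # illustrative values only
    print(phi, compound_scale(phi), round(flops_factor(phi), 2))
```

Because the base coefficients satisfy roughly `ALPHA * BETA**2 * GAMMA**2 ≈ 2`, each unit increase in `phi` approximately doubles the compute budget while keeping the three dimensions in balance.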

Timesformer Hr Finetuned Ssv2

TimeSformer is a video classification model based on a spatio-temporal attention mechanism, fine-tuned on the Something Something v2 dataset.

Video Processing

Timesformer Hr Finetuned K600

TimeSformer is a video understanding model based on spatio-temporal attention mechanisms. This version is a high-resolution variant fine-tuned on the Kinetics-600 dataset.

Video Processing

Timesformer Hr Finetuned Ssv2

TimeSformer is a video understanding model based on spatio-temporal attention mechanisms. This version is a high-resolution variant fine-tuned on the Something Something v2 dataset.

Video Processing

Timesformer Hr Finetuned K400

TimeSformer is a video understanding model based on spatio-temporal attention mechanisms, pre-trained and fine-tuned on the Kinetics-400 dataset.

Video Processing

Beit Base Finetuned Ade 640 640

BEiT is a model based on the Vision Transformer (ViT) architecture, pre-trained on ImageNet-21k through self-supervised learning and fine-tuned on the ADE20k dataset, specifically designed for image semantic segmentation tasks.

Image Segmentation

Segformer B0 Finetuned Cityscapes 640 1280

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the Cityscapes dataset, suitable for road scene segmentation tasks.

Image Segmentation

© 2025 AIbase